The Partial Weighted Set Cover Problem with Applications to Outlier Detection and Clustering
نویسندگان
چکیده
We define the partial weighted set cover problem, a generic combinatorial optimization problem, that includes some classical data mining problems as special cases. We prove that it is computationally intractable and give a local search algorithm for this problem. As application examples, we then show how to translate clustering and outlier detection problems into this generic problem. Our experiments on synthetic and real-world datasets indicate that the quality of the solution produced by the generic local search algorithm is comparable to that obtained by state-of-the-art clustering and outlier detection algorithms.
منابع مشابه
Outlier Detection in Wireless Sensor Networks Using Distributed Principal Component Analysis
Detecting anomalies is an important challenge for intrusion detection and fault diagnosis in wireless sensor networks (WSNs). To address the problem of outlier detection in wireless sensor networks, in this paper we present a PCA-based centralized approach and a DPCA-based distributed energy-efficient approach for detecting outliers in sensed data in a WSN. The outliers in sensed data can be ca...
متن کاملOnline Clustering and Outlier Detection
Clustering and outlier detection are important data mining areas. Online clustering and outlier detection generally work with continuous data streams generated at a rapid rate and have many practical applications, such as network instruction detection and online fraud detection. This chapter first reviews related background of online clustering and outlier detection. Then, an incremental cluste...
متن کاملOutlier Detection Using Extreme Learning Machines Based on Quantum Fuzzy C-Means
One of the most important concerns of a data miner is always to have accurate and error-free data. Data that does not contain human errors and whose records are full and contain correct data. In this paper, a new learning model based on an extreme learning machine neural network is proposed for outlier detection. The function of neural networks depends on various parameters such as the structur...
متن کاملData Clustring Using A New CGA(Chaotic-Generic Algorithm) Approach
Clustering is the process of dividing a set of input data into a number of subgroups. The members of each subgroup are similar to each other but different from members of other subgroups. The genetic algorithm has enjoyed many applications in clustering data. One of these applications is the clustering of images. The problem with the earlier methods used in clustering images was in selecting in...
متن کاملAssessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories
In recent years, the tremendous and increasing growth of spatial trajectory data and the necessity of processing and extraction of useful information and meaningful patterns have led to the fact that many researchers have been attracted to the field of spatio-temporal trajectory clustering. The process and analysis of these trajectories have resulted in the extraction of useful information whic...
متن کامل